BHP - Discovery analysis of the actor table¶

Remove unwanted rows ('[à identifier]')¶

Row count before filter: 61556
Row count after filter: 59625 (1931 rows removed)
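
The filter above could be sketched with pandas as follows. The source does not say which column holds the '[à identifier]' placeholder, so the choice of `concat_standard_name`, the helper name `remove_placeholder_rows`, and the small sample DataFrame are all assumptions made for illustration.

```python
import pandas as pd

PLACEHOLDER = "[à identifier]"  # marker of rows that still have to be identified


def remove_placeholder_rows(df, column="concat_standard_name"):
    """Return a copy of df without the rows whose `column` contains the placeholder."""
    # regex=False: the square brackets are literal characters, not a character class.
    # na=False: rows with a missing name are kept instead of being treated as matches.
    mask = df[column].str.contains(PLACEHOLDER, regex=False, na=False)
    return df.loc[~mask].copy()


# Hypothetical sample data, not the real actor table.
actors = pd.DataFrame({
    "pk_actor": [1, 2, 3],
    "concat_standard_name": ["Valjean, Jean", "[à identifier]", None],
})

filtered = remove_placeholder_rows(actors)
removed = len(actors) - len(filtered)
print(f"Row count before filter: {len(actors)}")
print(f"Row count after filter: {len(filtered)} ({removed} removed)")
```

Matching with `str.contains` rather than strict equality also catches names in which the placeholder is only part of the value. Whether that is the wanted behaviour depends on the real data.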

Keep only actors of type 104¶

Number of non-104 actors: 3
pk_actor concat_actr concat_standard_name begin_year certainty_begin notes_begin end_year certainty_end notes_end gender_iso notes fk_abob_type_actor creator creation_time modifier modification_time concat_names standard_text_property count_text_property
10340 59031 Actr59031 Forster, James 1830.0 3 3 1930.0 3 3 1 None 106.0 81.0 2016-11-29 11:05:00.060 81.0 2016-11-29 11:05:00 forster, james Forster, James <p>Personnage de Jules Verne dans Le Tour du m... 1
28956 60660 Actr60660 Valjean, Jean 1769.0 1 None 1833.0 1 None 1 None 106.0 122.0 2018-10-23 16:48:50.050 122.0 2018-10-23 16:48:50 valjean, jean Valjean, Jean <p>Personnage de fiction, héros du roman "Les ... 1
46023 46914 Actr46914 Dieu (conception chrétienne) NaN 1 None NaN None None 0 None 106.0 3.0 2013-07-04 11:43:15.990 3.0 2013-12-18 15:24:16 dieu (conception chretienne) Dieu (conception ... <p>La divinité dans les religions chétiennes</p> 1
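
A minimal sketch of this step, assuming the actor type is the `fk_abob_type_actor` column (the three rows listed above all carry 106.0 there). The sample rows are hypothetical.

```python
import pandas as pd

ACTOR_TYPE = 104  # type code kept by the analysis

# Hypothetical sample data; the real column is stored as float (e.g. 106.0).
actors = pd.DataFrame({
    "pk_actor": [59031, 60660, 1, 2],
    "fk_abob_type_actor": [106.0, 106.0, 104.0, 104.0],
})

is_wanted = actors["fk_abob_type_actor"] == ACTOR_TYPE
others = actors.loc[~is_wanted]        # rows to inspect before dropping them
actors = actors.loc[is_wanted].copy()  # rows kept for the rest of the analysis

print(f"Number of non-{ACTOR_TYPE} actors: {len(others)}")
```

A row with a missing type compares as not equal to 104, so it would be counted among the removed rows here.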

Column list¶

pk_actor
concat_actr
concat_standard_name
begin_year
certainty_begin
notes_begin
end_year
certainty_end
notes_end
gender_iso
notes
fk_abob_type_actor
creator
creation_time
modifier
modification_time
concat_names
standard_text_property
count_text_property

Missing data analysis¶

Total NaN number: 216970 (19.15%)
Number of columns with NaN: 11
Column <creator> is missing 3 values (0.01%)
Column <gender_iso> is missing 22 values (0.04%)
Column <modifier> is missing 5308 values (8.90%)
Column <certainty_begin> is missing 5606 values (9.40%)
Column <certainty_end> is missing 8629 values (14.47%)
Column <begin_year> is missing 11076 values (18.58%)
Column <standard_text_property> is missing 19012 values (31.89%)
Column <end_year> is missing 30214 values (50.68%)
Column <notes_begin> is missing 40373 values (67.71%)
Column <notes_end> is missing 43166 values (72.40%)
Column <notes> is missing 53561 values (89.83%)
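
A report of this kind could be produced with a small helper such as the one below. This is a sketch, not the notebook's actual code: the function name `missing_report` and the sample DataFrame are assumptions. The overall percentage is computed over all cells of the table and the per-column percentage over the number of rows.

```python
import numpy as np
import pandas as pd


def missing_report(df):
    """Print the overall and per-column counts of missing values and return them."""
    n_missing = df.isna().sum()                         # missing values per column
    n_missing = n_missing[n_missing > 0].sort_values()  # keep affected columns only
    total = int(n_missing.sum())
    print(f"Total NaN number: {total} ({total / df.size:.2%})")
    print(f"Number of columns with NaN: {len(n_missing)}")
    for column, count in n_missing.items():
        print(f"Column <{column}> is missing {count} values ({count / len(df):.2%})")
    return n_missing


# Hypothetical sample data, not the real actor table.
actors = pd.DataFrame({
    "pk_actor": [1, 2, 3, 4],
    "begin_year": [1830.0, np.nan, 1769.0, np.nan],
    "notes": [None, None, None, "x"],
})
report = missing_report(actors)
```

`isna()` treats both `None` and `NaN` as missing, but not the literal string "None". A column that stores that string would first have to be converted.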

Column concat_standard_name¶

Number of unfilled: 0
Number of empty   : 0
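
The two counters are not defined in the source. The sketch below assumes that "unfilled" means a missing value (`None`/`NaN`) and "empty" means a string that is blank once whitespace is stripped. The helper name and sample values are hypothetical.

```python
import pandas as pd


def count_unfilled_and_empty(series):
    """Return (number of missing values, number of blank strings) for a text column."""
    unfilled = int(series.isna().sum())
    # Missing values are dropped first so they are not counted twice;
    # stripping makes whitespace-only strings such as "  " count as empty.
    empty = int((series.dropna().astype(str).str.strip() == "").sum())
    return unfilled, empty


# Hypothetical sample data.
names = pd.Series(["Valjean, Jean", None, "  ", ""])
unfilled, empty = count_unfilled_and_empty(names)
print(f"Number of unfilled: {unfilled}")
print(f"Number of empty   : {empty}")
```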

Column gender_iso¶

Column content:

4 unique values:
 80.02%: "1" ==> 47709
 16.96%: "2" ==> 10113
  2.98%: "0" ==> 1778
  0.04%: "None" ==> 22

Following the ISO/IEC 5218 coding of sex (0 = not known, 1 = male, 2 = female, 9 = not applicable), I replace the 22 'None' values with 0.
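
A minimal sketch of that replacement, assuming the standard referred to is ISO/IEC 5218 (where 0 means "not known") and that the 'None' entries are real missing values rather than the string "None". The sample data is hypothetical.

```python
import pandas as pd

# ISO/IEC 5218 codes: 0 = not known, 1 = male, 2 = female, 9 = not applicable
GENDER_NOT_KNOWN = 0

# Hypothetical sample data; the None entry becomes NaN in a float column.
actors = pd.DataFrame({"gender_iso": [1, 2, None, 0, 1]})

# Fill the missing codes, then restore an integer dtype.
actors["gender_iso"] = actors["gender_iso"].fillna(GENDER_NOT_KNOWN).astype(int)

print(actors["gender_iso"].value_counts())
```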

Column certainty_begin¶

4 unique values:
 63.03%: "1" ==> 37581
 23.32%: "3" ==> 13903
  9.40%: "None" ==> 5606
  4.25%: "2" ==> 2532
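
The per-column listings in this notebook (number of unique values, then share and count of each) could come from a helper like the following. The function name `value_distribution`, its `top` parameter, and the sample Series are assumptions made for illustration.

```python
import pandas as pd


def value_distribution(series, top=None):
    """Print and return the count of each distinct value, missing values included."""
    counts = series.value_counts(dropna=False)  # sorted by decreasing count
    print(f"{len(counts)} unique values:")
    shown = counts if top is None else counts.head(top)
    for value, count in shown.items():
        label = "None" if pd.isna(value) else value
        print(f'{count / len(series):7.2%}: "{label}" ==> {count}')
    return counts


# Hypothetical sample data.
certainty = pd.Series([1, 1, 1, 3, 3, None, 2, 1], dtype="object")
counts = value_distribution(certainty)
```

For columns with many distinct values, such as the year columns, calling `value_distribution(series, top=5)` limits the listing to the five most frequent values while still reporting the total number of unique values.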

Column begin_year¶

848 unique values:
 18.58%: "None" ==> 11076
  1.97%: "1825" ==> 1175
  1.92%: "1860" ==> 1144
  1.18%: "1850" ==> 703
  0.99%: "1870" ==> 589

Column certainty_end¶

4 unique values:
 63.03%: "1" ==> 37581
 23.32%: "3" ==> 13903
  9.40%: "None" ==> 5606
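  4.25%: "2" ==> 2532

These counts are identical to those of certainty_begin, whereas the missing data analysis reports 8629 (14.47%) missing values for certainty_end. The listing above therefore probably shows certainty_begin again, and the figures should be recomputed on certainty_end.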

Column end_year¶

819 unique values:
 50.68%: "None" ==> 30214
  1.93%: "1901" ==> 1152
  0.27%: "1931" ==> 162
  0.27%: "1680" ==> 161
  0.27%: "1650" ==> 158

Column creation_time¶

Column creator¶